pacoxu
Member

@pacoxu pacoxu commented Oct 9, 2025

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory labels Oct 9, 2025
@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 9, 2025
@pacoxu
Member Author

pacoxu commented Oct 9, 2025

/cc @ruiwen-zhao
/assign @wojtek-t @SergeyKanzhelev

@k8s-ci-robot
Contributor

@pacoxu: GitHub didn't allow me to request PR reviews from the following users: ruiwen-zhao.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @ruiwen-zhao
/assign @wojtek-t @SergeyKanzhelev

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Member

@SergeyKanzhelev SergeyKanzhelev left a comment

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 9, 2025

#### GA

- Change the default value of `serialize-image-pulls` to false and set the default value of `maxParallelImagePulls` to 2.
Member

Hmm, do we really want to change the default at GA?

With https://github.com/kubernetes/enhancements/tree/master/keps/sig-architecture/5241-beta-featuregate-promotion-requirements, we generally want GA to be effectively a no-op. Changing the default here might be a bit unexpected.

I know that it doesn't explicitly affect users (it may affect them implicitly, because some pods may start slower or faster depending on image pulling), but it still might not be intuitive.

Let me ping the other PRR approvers for their thoughts.

Member Author

If so, should we add a beta-2 step to this KEP and change the default there instead?

Member

So I thought about it, and we discussed it a bit with the other PRR approvers. Based on that discussion, I have a few questions before we make a final decision:

  1. My understanding was that, before this KEP, the default behavior was "unbounded parallel image pulls".
    But looking at this proposal, I see this sentence: "Before this proposal, serialize-image-pulls is by default true", which suggests otherwise.
    So either my understanding was incorrect, or this sentence is not true, or I don't understand the semantics of serialize-image-pulls.

Can you please clarify what the default semantics are today (before this proposal)?

  2. If I was indeed wrong and we were serializing image pulls by default, then switching the default seems like a no-go (whether at GA, in another beta, or anywhere else): it looks like a potentially breaking change for users.

Member

Serialized pull was the default. We are making the default marginally better by allowing 2 parallel pulls to improve reliability (one bad image will not break the whole node).

This is not strictly required. Many installations override this default anyway because it is not reliable. It is just nice to have for somebody who is only trying it out.

If PRR is blocked on this, let's just keep the old defaults and forget about it.

Member Author

@pacoxu pacoxu Oct 14, 2025

  1. Previous default behavior: serialized pulls. A single image pull may block the node from pulling new images for new pods.
  2. Previous behavior with serialize-image-pulls=false and maxParallelImagePulls unset: unlimited parallel pulls, even when registryPullQPS and registryBurst are set (see https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/3673-kubelet-parallel-image-pull-limit/README.md#qpsburst-limits-on-kubelet-are-confusing). Unlimited parallel pulls may put too much IO pressure on the disk.
  3. After this default value change: we allow 2 parallel pulls to improve reliability (one bad image will not break the whole node), as Sergey said.

I think neither 1 nor 2 is an ideal default, and 3 would be a better one. This is a breaking change to some extent, but an acceptable one.

P.S. A limit of 2 is a very conservative and cautious choice.
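
To make the three behaviors above concrete, here is a minimal Go sketch that bounds concurrent pulls with a buffered channel used as a counting semaphore. It is an illustration only and not the kubelet's actual implementation; the function name `peakConcurrency` and the sleep that stands in for a pull are assumptions made for this example.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// peakConcurrency runs n simulated pulls with at most limit in flight
// (limit <= 0 means unbounded) and returns the highest concurrency observed.
// Hypothetical helper for illustration; not part of the kubelet.
func peakConcurrency(n, limit int) int {
	var sem chan struct{}
	if limit > 0 {
		sem = make(chan struct{}, limit)
	}
	var (
		mu       sync.Mutex
		inFlight int
		peak     int
		wg       sync.WaitGroup
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if sem != nil {
				sem <- struct{}{}        // acquire a pull slot
				defer func() { <-sem }() // release it when the pull ends
			}
			mu.Lock()
			inFlight++
			if inFlight > peak {
				peak = inFlight
			}
			mu.Unlock()

			time.Sleep(10 * time.Millisecond) // stand-in for the actual pull

			mu.Lock()
			inFlight--
			mu.Unlock()
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println("serialized (limit 1):", peakConcurrency(6, 1))
	fmt.Println("bounded (limit 2):   ", peakConcurrency(6, 2))
	fmt.Println("unbounded (limit 0): ", peakConcurrency(6, 0))
}
```

In this sketch a limit of 1 corresponds to behavior 1, no limit to behavior 2, and a limit of 2 to behavior 3. With a limit of 1, a single stuck pull would hold the only slot, which is the blocking problem described in item 1.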

Member

So I would like to distinguish two things:

  1. Changing the default from serialized to parallel.
  2. Changing the default number of parallel image pulls, when serialize is false, from unbounded to 2.

In (1) we change the default behavior even for administrators who aren't aware of how image pulls work. I agree that it helps in some cases (a single pull blocking all other pulls), but on the other hand it may negatively affect existing use cases (if I have two large pulls at the same time, it's actually better to download the first one and only then start the second, rather than having the first one take 2x longer).
So I don't think we should change that default. The mitigation for administrators is to configure their setup with parallel pulls, and they can do that with this feature.

In (2), given that "serialize-image-pulls=false" is not the default, administrators who set it are already aware of it. I agree that "unbounded" is bad, and switching to maxParallelImagePulls=2 in that case (if someone didn't set it explicitly) seems helpful. I'm fine with this change, but the consensus among PRR approvers is that it would then have to be a second beta.
[And I would actually go with this option.]
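
The "two large pulls" argument in (1) can be checked with a small worked example. The Go sketch below assumes two equally sized images, a fixed total bandwidth, and an even split of that bandwidth between concurrent pulls; the function name `completionTimes` and the numbers used are hypothetical, and real registries and disks will behave less ideally.

```go
package main

import "fmt"

// completionTimes returns when each of two equally sized pulls finishes,
// in seconds, given the image size in MB and the total bandwidth in MB/s.
// Assumption: concurrent pulls share the bandwidth evenly.
func completionTimes(sizeMB, bandwidthMBps float64, parallel bool) (first, second float64) {
	single := sizeMB / bandwidthMBps
	if parallel {
		// Both pulls run at half speed, so each takes twice as long.
		return 2 * single, 2 * single
	}
	// Serialized: the second pull starts only after the first completes.
	return single, 2 * single
}

func main() {
	f, s := completionTimes(1000, 100, false)
	fmt.Printf("serialized: first=%.0fs second=%.0fs\n", f, s)
	f, s = completionTimes(1000, 100, true)
	fmt.Printf("parallel:   first=%.0fs second=%.0fs\n", f, s)
}
```

Under these assumptions the second image is ready at the same time either way, while the first image is ready twice as late when the pulls run in parallel.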

Member

I would prefer to just GA this KEP then. Dragging this out for a default configuration update that most environments need to fine-tune anyway doesn't sound attractive.

Member Author

@pacoxu pacoxu Oct 14, 2025

There seem to be 3 proposals:

  1. Change both default values: serializeImagePulls: false and maxParallelImagePulls: 2.
  2. Change the default number of parallel image pulls, when serialize is false, from unbounded to 2.
  3. Do nothing and GA.

@wojtek-t +1 for 2, and @SergeyKanzhelev +1 for 3.

IIRC, we discussed whether we should set a default value for maxParallelImagePulls when the KEP was first introduced.

https://github.com/kubernetes/kubernetes/blob/42ee6dafd55e411a258f6e5d947b1fb95f38a6b5/pkg/kubelet/apis/config/v1beta1/defaults.go#L226-L235

	if obj.SerializeImagePulls == nil {
		// SerializeImagePulls is default to true when MaxParallelImagePulls
		// is not set, and false when MaxParallelImagePulls is set.
		// This is to save users from having to set both configs.
		if obj.MaxParallelImagePulls == nil || *obj.MaxParallelImagePulls < 2 {
			obj.SerializeImagePulls = ptr.To(true)
		} else {
			obj.SerializeImagePulls = ptr.To(false)
		}
	}

Currently, setting MaxParallelImagePulls=2 will enable parallel image pulling if SerializeImagePulls is not set.
So option 2 might look something like this:

	if obj.SerializeImagePulls == nil {
		// SerializeImagePulls is default to true when MaxParallelImagePulls
		// is not set, and false when MaxParallelImagePulls is set.
		// This is to save users from having to set both configs.
		if obj.MaxParallelImagePulls == nil || *obj.MaxParallelImagePulls < 2 {
			obj.SerializeImagePulls = ptr.To(true)
		} else {
			obj.SerializeImagePulls = ptr.To(false)
		}
+	} else if !*obj.SerializeImagePulls && obj.MaxParallelImagePulls == nil  {
+		obj.MaxParallelImagePulls = ptr.To[int32](2)
+	}
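
For readers who want to try the option-2 defaulting in isolation, the following is a self-contained Go sketch of the logic in the diff above. The `kubeletConfig` struct and the `ptrTo` helper are hypothetical stand-ins for the real KubeletConfiguration type and the `ptr.To` helper, so this shows the decision table only and is not the upstream code.

```go
package main

import "fmt"

// kubeletConfig is a hypothetical stand-in for the two KubeletConfiguration
// fields discussed here; it is not the real Kubernetes type.
type kubeletConfig struct {
	SerializeImagePulls   *bool
	MaxParallelImagePulls *int32
}

// ptrTo mirrors the ptr.To helper used in the quoted snippet.
func ptrTo[T any](v T) *T { return &v }

// setDefaults applies the existing defaulting plus the option-2 branch.
func setDefaults(obj *kubeletConfig) {
	if obj.SerializeImagePulls == nil {
		// Existing behavior: derive SerializeImagePulls from the limit.
		if obj.MaxParallelImagePulls == nil || *obj.MaxParallelImagePulls < 2 {
			obj.SerializeImagePulls = ptrTo(true)
		} else {
			obj.SerializeImagePulls = ptrTo(false)
		}
	} else if !*obj.SerializeImagePulls && obj.MaxParallelImagePulls == nil {
		// Option 2: cap previously unbounded parallel pulls at 2.
		obj.MaxParallelImagePulls = ptrTo[int32](2)
	}
}

// describe renders the effective settings; "unset" means no limit was defaulted.
func describe(c kubeletConfig) string {
	limit := "unset"
	if c.MaxParallelImagePulls != nil {
		limit = fmt.Sprint(*c.MaxParallelImagePulls)
	}
	return fmt.Sprintf("serialize=%t maxParallel=%s", *c.SerializeImagePulls, limit)
}

func main() {
	cases := []kubeletConfig{
		{}, // nothing set
		{MaxParallelImagePulls: ptrTo[int32](2)}, // only the limit set
		{SerializeImagePulls: ptrTo(false)},      // only serialize=false set
	}
	for _, c := range cases {
		setDefaults(&c)
		fmt.Println(describe(c))
	}
}
```

Only the third case is affected by the added branch: an explicit serialize=false with no limit would get a limit of 2 instead of staying unbounded, while a configuration with nothing set stays serialized.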

I think I would +1 doing nothing and GA-ing this.

Member Author

Updated to do nothing and GA this KEP (adding the new maxParallelImagePulls configuration to the kubelet, without a feature gate).

Member

> Currently, setting MaxParallelImagePulls=2 will enable parallel image pulling if SerializeImagePulls is not set.

That makes sense. And I think that helps with what we need.
I'm ok with GA-ing as is given the above.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 14, 2025
Member

@SergeyKanzhelev SergeyKanzhelev left a comment

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 14, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mrunalp, pacoxu, SergeyKanzhelev
Once this PR has been reviewed and has the lgtm label, please ask for approval from wojtek-t. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
